I am a first-year PhD student at the Stanford Graduate School of Education, working with Professor Nick Haber in the Stanford Autonomous Agents Lab. My research interests are broadly in curiosity, social learning, and causal learning. In particular, I want to understand the different motivations behind curiosity, such as the motivation to achieve optimal predictability (causal learning about the world) and the motivation to belong to a social group (a social motivation).
In the chosen paper, the first experiment and its result are relevant to my research interests. The result showed that people’s curiosity to reveal the answers to trivia questions, as a function of how confident they originally felt about answering them, could follow either an inverted U-shaped curve or a decreasing line, depending on how the future test questions were drawn. People’s curiosity could therefore be manipulated to peak either for questions that were moderately complex or for questions that were the most novel, depending on the condition. In the confidence sampling condition, participants knew that the test questions would be drawn according to the distribution of their confidence ratings, and before proceeding to the test they chose to reveal the answers to questions they did not fully know, but not to the questions they knew least. In the uniform sampling condition, participants knew that the test questions would be drawn at random in the test phase, and they chose to reveal the answers to the least-known questions before continuing to the test phase. The stimuli used in the original study were 40 trivia questions on different topics, such as animals and books. They were originally published in the first experiment of Kang et al. (2009).
This will be my first time building a web-based experiment and delivering it to online participants through Prolific. Potential challenges include implementing the experiment myself and learning the programming languages (JavaScript and PHP), the online experiment pipeline, and the tools used in the process (jsPsych and Prolific). I obtained the stimuli and most of the experiment code from the authors on request.
Project repository (on GitHub): https://github.com/psych251/Dubey2020. Original paper (as hosted in the repository): https://github.com/psych251/Dubey2020/tree/main/original_paper.
It is hard to conduct a power analysis because the original study used a quadratic curve to fit the experiment data for the main result.
To be conservative, the planned sample size for the replication is the same as the original number of participants: 303. I will recruit from the same population as the original study: adult participants (age > 18) living in the United States. Additionally, I will recruit only participants who are fluent in English and will balance gender in the recruited sample.
The same materials as in the original study will be used: “The stimuli used in the experiment were 40 trivia questions on various topics that were taken directly from Experiment 1 in Kang et al. (2009). According to the authors, these questions were designed to measure curiosity about semantic knowledge and evoke a range of curiosity levels.”
The procedure will be the same as in the original study, except that the number of questions will be reduced from 40 to 20 to shorten the test, given constraints:
"The experiment was divided into three phases – Phase 1, Phase 2, and Phase 3. In Phase 1, participants were shown 40 trivia questions one after another and were asked to rate their confidence (i.e., probability that they know the correct answer) and curiosity in knowing the correct answer. Curiosity ratings were on a scale from 1 to 7 and the confidence scale ranged from 0 to 100%. The order of trivia questions was randomized for each participant. Note that Phase 1 of our experiment followed the procedure of Kang et al.’s design closely. This part of the experiment took approximately 7-8 minutes to complete.
In Phase 2, all previous 40 questions from Phase 1 were again shown one after another and participants had the choice to reveal the answer to those questions. However, each time participants chose to reveal an answer, they had to wait an extra 10 seconds for the next question to appear.
Findings from Experiment 3 of Kang et al., (2009) showed that participants were more likely to spend time, to wait longer, for the answers that they were more curious about. Thus, requiring participants to spend time to obtain information served as a proxy to measure their curiosity.
In Phase 3, participants were shown 10 out of the previously shown 40 questions and were given the chance to answer these 10 questions. For each correct answer, participants were given a bonus of $0.08. To discourage participants from using Google or other search engines, they were only given 2 minutes in total to answer the questions.
At the beginning of the experiment, participants were randomly assigned to two conditions –the confidence and the uniform condition. These conditions differed in the way the 10 questions were sampled in Phase 3 and apart from that, the two conditions were exactly the same. In the confidence condition, the sampling in Phase 3, was done based on the confidence ratings provided by the participants i.e. the questions for which participant’s confidence rating was higher were more likely to appear in Phase 3. In the uniform condition, this sampling procedure was completely random i.e. each question was equally likely to appear in Phase 3. Importantly, participants were informed about the sampling procedure for their respective condition before the beginning of Phase 2. The confidence condition thus creates a situation in which confidence is related to probability of occurrence and the uniform condition breaks this relationship. To ensure that participants understood the sampling procedure, participants had to answer a multiple-choice question about the sampling procedure after the instructions for Phase 2 were shown to them. If they gave an incorrect answer, participants were shown the instructions again and had to re-answer the question (this process was repeated until they answered the question correctly)."
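The two Phase 3 sampling procedures described in the quoted passage can be sketched as follows. This is only an illustration under stated assumptions, not the original authors' code: the quote says that higher-confidence questions were "more likely to appear" but does not give the exact weighting, so the sketch assumes a selection probability proportional to the confidence rating, drawn without replacement. The function name, the `condition` labels, and the small weight floor are hypothetical.

```python
import random


def sample_test_questions(confidence, k=10, condition="uniform", rng=None):
    """Pick k distinct question ids for Phase 3.

    confidence: dict mapping question id -> confidence rating in [0, 100].
    condition:  "uniform" or "confidence" (hypothetical labels).
    """
    rng = rng or random.Random()
    ids = list(confidence)
    if condition == "uniform":
        # Every question is equally likely to appear in the test.
        return rng.sample(ids, k)

    # ASSUMPTION: selection weight is proportional to the confidence rating.
    # A tiny floor keeps zero-confidence questions drawable, so that k
    # questions can always be selected.
    remaining = {q: max(confidence[q], 1e-6) for q in ids}
    chosen = []
    for _ in range(k):
        candidates = list(remaining)
        weights = [remaining[q] for q in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # draw without replacement
    return chosen
```

Under this assumed weighting, a question rated at 90% confidence is far more likely to be tested than one rated at 10%, which is the link between confidence and probability of occurrence that the uniform condition breaks.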
I will use the same exclusion criterion as the original paper: participants who revealed either none or all of the answers in Phase 2 will be excluded.
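As a minimal sketch of this exclusion rule, assuming the Phase 2 choices are stored as one list of booleans per participant (the data layout and the function name are hypothetical, not taken from the original code):

```python
def apply_exclusion(reveals_by_participant):
    """Keep participants who revealed at least one but not all answers.

    reveals_by_participant: dict mapping participant id -> list of booleans,
    one per Phase 2 question (True = chose to reveal the answer).
    """
    kept = {}
    for pid, reveals in reveals_by_participant.items():
        n_revealed = sum(reveals)
        # Exclude participants who revealed none or all of the answers.
        if 0 < n_revealed < len(reveals):
            kept[pid] = reveals
    return kept
```

For example, a participant with `[True, False, True]` is kept, while participants with `[False, False, False]` or `[True, True, True]` are dropped.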
For analysis: “Following Kang et al.’s (2009) methodology, the raw curiosity ratings were individually normalized and confidence was rescaled to range from 0 to 1, and we fitted the data to the equation, curiosity = b0 + b1 x c + b2 x c x (1 - c), to both the conditions, where c was the rescaled confidence score…(In Phase two), we first computed the probability of participants revealing an answer conditioned on the confidence rating for both the conditions…Similar to the previous analysis, we fitted data to the equation…to both conditions.”
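The model fit described in the quote can be sketched in plain Python as below. Several details are assumptions: "individually normalized" is taken to mean z-scoring each participant's curiosity ratings, confidence is rescaled by dividing the 0 to 100 rating by 100, and the coefficients are estimated by ordinary least squares. The sketch returns only the point estimates; the standard errors and p-values reported in the paper would need a statistics package. All function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize one participant's curiosity ratings (ASSUMED: z-scoring)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_curve(confidence_pct, curiosity):
    """Least-squares fit of curiosity = b0 + b1*c + b2*c*(1 - c).

    confidence_pct: confidence ratings on the 0-100 scale.
    curiosity:      normalized curiosity (or reveal) values, same length.
    Returns [b0, b1, b2].
    """
    rows = []
    for pct in confidence_pct:
        c = pct / 100.0  # rescale confidence to [0, 1]
        rows.append([1.0, c, c * (1.0 - c)])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, curiosity)) for i in range(3)]
    return solve(xtx, xty)
```

The same `fit_curve` call could be applied to the Phase 2 data by passing the reveal probabilities instead of the normalized curiosity ratings.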
The key analysis of interest is whether the data from the two conditions fit the function differently. In the original paper, the confidence condition yielded a significant coefficient for the quadratic term (b2 estimate = 0.6, p < 0.05) and a nonsignificant coefficient for the linear term (b1 estimate = -0.03, p = 0.5), which indicates an inverted U-shaped (quadratic) relationship in this condition. In contrast, the uniform condition yielded a nonsignificant coefficient for the quadratic term (the b2 estimate was not reported, but p = .34) and a significant coefficient for the linear term (b1 estimate = -0.27, p < .05), which implies a decreasing relationship in this condition.
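To make the shape of the fitted function concrete: setting the derivative of b0 + b1*c + b2*c*(1 - c) to zero gives a peak at c = (b1 + b2) / (2*b2) whenever b2 > 0. The helper below is a hypothetical illustration of that calculation and is not part of the original analysis.

```python
def peak_confidence(b1, b2):
    """Confidence c in [0, 1] that maximizes b0 + b1*c + b2*c*(1 - c)."""
    if b2 <= 0:
        # Linear or convex curve: the maximum lies at an endpoint.
        return 0.0 if b1 <= 0 else 1.0
    # Interior stationary point from b1 + b2 - 2*b2*c = 0, clamped to [0, 1].
    c_star = (b1 + b2) / (2 * b2)
    return min(1.0, max(0.0, c_star))


# Original confidence-condition estimates: b1 = -0.03, b2 = 0.6.
print(round(peak_confidence(-0.03, 0.6), 3))
```

With the original confidence-condition estimates, the peak falls at roughly c = 0.475, that is, at moderate confidence. With b2 = 0 and a negative b1, the maximum sits at c = 0, which is the purely decreasing pattern.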
Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.
None.
Due to financial constraints, I collected data from 247 participants (124 in the confidence condition and 123 in the uniform condition). Applying the paper’s exclusion criterion (removing participants who revealed either all or none of the answers), 11 participants were excluded from the confidence condition and 15 from the uniform condition. The final sample size is 221 (113 in the confidence condition and 108 in the uniform condition).
None.
Data preparation will be done in MATLAB. If I have spare time, I will rewrite the code in R.
The analyses are as specified in the analysis plan.
Original Study Plot(on paper) - Original Study Plot(from code)
Pilot A Plot - Pilot B Plot - Final Data Plot
Original Significance Test Result - Pilot B Result - Final Result
Due to financial constraints, I set up two separate Prolific accounts to collect data for the two conditions separately. As a result, some participants took part in both conditions, despite instructions asking participants not to take the part two study if they had completed part one.
For the exploratory analysis, I excluded the second attempt of each participant who took part twice. Here are the results.
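A minimal sketch of this filtering step, assuming each submission carries a stable participant identifier and a sortable start time (the keys `participant_id` and `started_at` are hypothetical names, not fields from the actual data files):

```python
def keep_first_attempts(records):
    """Return only each participant's earliest submission.

    records: list of dicts with hypothetical keys 'participant_id' and
    'started_at' (any sortable timestamp, e.g. an ISO 8601 string).
    """
    first = {}
    for rec in sorted(records, key=lambda r: r["started_at"]):
        # setdefault stores a record only the first time an id is seen.
        first.setdefault(rec["participant_id"], rec)
    return list(first.values())
```

A participant who completed the confidence condition first and the uniform condition later would then contribute only the confidence-condition data.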
Participants_Without_Repetition_Plot - Participants_Without_Repetition_Significance_Test
If there is additional time, I hope to 1) explore whether other models, such as two linear models, can fit the data in the confidence condition, following the method described here: http://datacolada.org/27; 2) analyze curiosity by topic to see whether curiosity (the tendency to reveal answers) for specific topics is shared across participants; 3) examine the correlation between participants’ curiosity ratings and their confidence ratings in Phase 1; and 4) examine within-subject data from participants who took part in both conditions.
In the model fitting results for the final data, the confidence condition yields significant coefficients for both the quadratic term and the linear term (b2 estimate = 0.5, p < 0.05; b1 estimate = -0.17, p < 0.05). The uniform condition yields a significant coefficient for the linear term (b1 estimate = -0.17, p < 0.05) and a nonsignificant coefficient for the quadratic term (b2 estimate = 0.4, p = 0.056).
In the model fitting results for the final data without repeated participation, the confidence condition yields significant coefficients for both the quadratic term and the linear term (b2 estimate = 0.49, p < 0.05; b1 estimate = -0.17, p < 0.05). The uniform condition yields a significant coefficient for the linear term (b1 estimate = -0.22, p < 0.05) and a nonsignificant coefficient for the quadratic term (b2 estimate = 0.4, p = 0.052).
For comparison, the original paper’s numbers are as follows. In the confidence condition, the b2 estimate was 0.6 (p < 0.05) and the b1 estimate was -0.03 (p = 0.5). In the uniform condition, the b2 estimate was not reported (p = .34) and the b1 estimate was -0.27 (p < 0.05).
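Based on the above results, I think the study failed to replicate the original result. In the original paper, only the quadratic term was significant in the confidence condition and only the linear term was significant in the uniform condition. In the replication, the linear term was also significant in the confidence condition, and the quadratic term in the uniform condition was close to significance (p = 0.056), so the two conditions did not show clearly different shapes.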
Further exploratory analyses, as specified above, would help to gain more insight into participants’ decisions and behaviors. Further analyses of the original dataset would also help, for example to see how many participants rated questions at zero or full confidence and, if that number is small, whether a few outliers who revealed more answers at confidence = 0 and fewer at confidence = 1 (more rational behavior) made the uniform condition look more linearly decreasing.
The results also showed that participants were not fully rational: in the uniform condition, they did not always reveal the answers to questions they had rated at confidence = 0. I wonder whether the potential barriers are memory (not remembering which specific questions they rated at zero confidence, which adds noise), the time constraint of the study (participants may have felt rushed and had little time to think), or an intrinsic motivation to reveal answers even when fully confident about them. Future studies that disentangle these variables could show the effect more clearly.